Trustworthiness in Large Language Models (LLMs)

Building trustworthy LLMs is crucial for applications in sensitive areas like healthcare and finance. Models like ChatGPT generate human-like responses, but fluent output does not guarantee truthfulness, safety, or privacy.

A recent study by Sun et al. (2024) examines trustworthiness in LLMs, covering challenges, benchmarks, and future research directions.

One of the biggest hurdles to using current LLMs in real-world applications is ensuring they are trustworthy. The study proposes a set of principles for trustworthy LLMs spanning 8 dimensions, and evaluates models on 6 of them: truthfulness, safety, fairness, robustness, privacy, and machine ethics.

Trustworthiness Benchmark
The authors also created a benchmark to test LLMs on trustworthiness. Here’s what the study found when evaluating 16 popular LLMs using over 30 datasets:

1. Proprietary models (like GPT-4) generally perform better than open-source ones on trustworthiness.
2. Open-source models (like Llama 2) are improving and come close to proprietary models, even without external moderation tools. However, models like Llama 2 are sometimes too cautious, misinterpreting harmless prompts as harmful.
3. Key Insights:
   - Truthfulness: LLMs often struggle with accuracy due to outdated or incorrect training data, but models connected to external knowledge sources do better.
   - Safety: Open-source models lag in safety, especially in handling misuse or harmful content.
   - Fairness: LLMs don’t handle stereotypes well. Even advanced models like GPT-4 reach only around 65% accuracy in this area.
   - Robustness: LLMs show mixed results when dealing with unexpected or out-of-scope tasks.
   - Privacy: Most models understand privacy concerns but handle sensitive data inconsistently, with some even leaking information in tests.
   - Machine Ethics: While LLMs can follow basic moral guidelines, they struggle with complex ethical issues.
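
The over-caution noted above for Llama 2 can be expressed as a refusal rate on harmless prompts. The snippet below is a minimal sketch of that idea, not the study's method: the keyword list, the function names, and the sample responses are all made up for illustration, and a real evaluation would likely need a more reliable refusal detector than substring matching.

```python
# Hypothetical refusal markers, chosen for illustration only.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "i am unable")


def is_refusal(response: str) -> bool:
    """Return True if the response contains any of the refusal markers."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def over_refusal_rate(responses: list[str]) -> float:
    """Fraction of responses to harmless prompts that were refused."""
    if not responses:
        return 0.0
    refusals = sum(is_refusal(r) for r in responses)
    return refusals / len(responses)


# Invented responses to harmless prompts (not taken from the study).
sample_responses = [
    "I'm sorry, but I can't help with that request.",
    "To kill a Python process, run `kill <pid>` in your terminal.",
    "Sure! Here is a recipe for a killer chocolate cake.",
    "I cannot assist with anything involving killing.",
]

print(f"Over-refusal rate: {over_refusal_rate(sample_responses):.2f}")
```

A lower rate on harmless prompts is better, but it only makes sense alongside a safety score on genuinely harmful prompts, since a model that never refuses would score perfectly here.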

Trustworthiness Leaderboard
The study also published a leaderboard ranking LLMs based on their trustworthiness in different areas. A GitHub repository is available for anyone who wants to test LLMs for trustworthiness themselves.
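
To give a rough idea of how per-area scores can be turned into rankings, here is a small sketch. The model names and scores are placeholders, not results from the study, and ranking by an unweighted mean is an assumption made here for illustration; the published leaderboard may aggregate differently or not at all.

```python
# Placeholder scores in [0, 1]; NOT results from the study.
scores = {
    "model_a": {"truthfulness": 0.70, "safety": 0.90, "privacy": 0.60},
    "model_b": {"truthfulness": 0.80, "safety": 0.60, "privacy": 0.90},
    "model_c": {"truthfulness": 0.50, "safety": 0.80, "privacy": 0.70},
}


def rank_by_dimension(scores: dict, dimension: str) -> list[str]:
    """Order models from best to worst on a single dimension."""
    return sorted(scores, key=lambda model: scores[model][dimension], reverse=True)


def rank_by_mean(scores: dict) -> list[str]:
    """Order models by the unweighted mean of their scores (an assumption)."""
    def mean_score(model: str) -> float:
        values = scores[model].values()
        return sum(values) / len(values)

    return sorted(scores, key=mean_score, reverse=True)


print(rank_by_dimension(scores, "safety"))
print(rank_by_mean(scores))
```

Note that the two rankings differ: the placeholder model that is best on safety is not best on average, which is one reason to look at each area separately rather than at a single overall number.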
